Abstract: In this paper we study certain properties of Rényi
entropy functions $H_\alpha(\mathcal{P})$ on the space of discrete probability
distributions with infinitely many probability masses. We prove
some properties that parallel those known in the finite case.
Some properties, on the other hand, are quite different in the
infinite case, for example the (dis)continuity in $\mathcal{P}$ and the
problem of divergence and behaviour of $H_\alpha(\mathcal{P})$ at the point
of divergence. Finally, we prove that, given a sequence of
distributions $\mathcal{P}_n$ converging to $\mathcal{P}$ with respect to the total variation
distance, $\lim_{\alpha\to 1^+}\lim_{n\to\infty} H_\alpha(\mathcal{P}_n)$ is in general not equal to
$\lim_{n\to\infty}\lim_{\alpha\to 1^+} H_\alpha(\mathcal{P}_n)$, so interchanging limiting operations
(which is often done in applications) is not justified in this case.

Index Terms: Entropy, Rényi entropy, infinite alphabet.



I. Introduction 

Rényi entropies are a family of functions introduced in [1],
[2] on axiomatic grounds as a generalization of Shannon
entropy and have since found a number of applications
in information and coding theory [4], [5], [6], statistical
physics [7], [8], multifractal systems [9], etc.

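For a probability distribution $\mathcal{P} = (p_1, \ldots, p_N)$, Rényi
entropy of order $\alpha$, $\alpha > 0$, is defined as

$$H_\alpha(\mathcal{P}) = \frac{1}{1-\alpha} \log \sum_{n=1}^{N} p_n^\alpha, \tag{1}$$

where it is understood that

$$H_1(\mathcal{P}) = \lim_{\alpha\to 1} H_\alpha(\mathcal{P}) = -\sum_{n=1}^{N} p_n \log p_n = H(\mathcal{P}), \tag{2}$$

which is precisely the Shannon entropy of $\mathcal{P}$. The base of
the logarithm in (1), $b > 1$, is arbitrary and will not be
specified. In most textbooks [10] and papers, properties of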
Rényi entropy for discrete probability distributions are stated
and proven only in the finite case. These quantities, however,
are frequently used when the number of possible outcomes
is infinite, namely, in statistical mechanics where systems
with an infinite number of particles are often considered
[11], in Markov chains with an infinite number of states,
information sources with an infinite number of symbols [12], [13],
etc. And of course, it is important in itself to generalize
and extend scientific concepts whenever possible because that
usually leads to better understanding and a broader view of
the corresponding theory. So, for a probability distribution
$\mathcal{P} = (p_1, p_2, \ldots)$, define

$$H_\alpha(\mathcal{P}) = \begin{cases} \dfrac{1}{1-\alpha} \log \sum_{n=1}^{\infty} p_n^\alpha, & \alpha \neq 1 \\[2mm] -\sum_{n=1}^{\infty} p_n \log p_n, & \alpha = 1. \end{cases} \tag{3}$$

Our aim here is to prove some important properties of these 
functions. 
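
As a minimal sketch of the finite-alphabet definition, the following Python function evaluates the Rényi entropy of a list of probability masses and lets one observe numerically that orders close to 1 give values close to the Shannon entropy. The function name `renyi_entropy`, the default base 2 and the sample distribution are assumptions made for illustration only.

```python
import math

def renyi_entropy(p, alpha, base=2.0):
    """Renyi entropy of order alpha > 0 for a finite list of masses p."""
    masses = [x for x in p if x > 0]          # zero masses contribute nothing for alpha > 0
    if alpha == 1.0:
        # Order 1 is defined as the Shannon entropy.
        return -sum(x * math.log(x, base) for x in masses)
    return math.log(sum(x ** alpha for x in masses), base) / (1.0 - alpha)

p = [0.5, 0.25, 0.25]                          # illustrative distribution
shannon = renyi_entropy(p, 1.0)                # 1.5 bits for this p
near_one = renyi_entropy(p, 1.000001)          # order slightly above 1
```

For this distribution the values at orders 0.5, 1 and 2 are non-increasing in the order, which matches the general monotonicity of Rényi entropy in $\alpha$.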

Recently, Ho and Yeung [14], [15] proved some nice
and somewhat surprising properties of Shannon information 
measures over infinite alphabets. Our findings continue this 
line of research and give some new insights into the general 
behaviour of information measures. 

II. Region of convergence 

For any probability distribution with a finite number of probability
masses, Rényi entropy of order $\alpha$ exists for any $\alpha > 0$.
However, in the case of distributions with an infinite number
of masses the problem of divergence appears. Obviously, $H_\alpha$
converges for any $\alpha > 1$ because $\sum_{n=1}^{\infty} p_n^\alpha < \sum_{n=1}^{\infty} p_n = 1$.
Also, it is easy to see that if $H_{\alpha_0}(\mathcal{P}) < \infty$ then $H_\alpha(\mathcal{P}) < \infty$
for all $\alpha > \alpha_0$. Call

$$\alpha_c(\mathcal{P}) = \inf\{\alpha : H_\alpha(\mathcal{P}) < \infty\} \tag{4}$$

the (Rényi's) critical exponent of the probability distribution
$\mathcal{P}$. Clearly, $\alpha_c(\mathcal{P}) \le 1$ and $H_\alpha(\mathcal{P}) = \infty$ for all $\alpha < \alpha_c(\mathcal{P})$.
It is also interesting to see what happens at $\alpha_c$. It turns out
that $H_{\alpha_c}(\mathcal{P})$ can converge or diverge here, depending on
the asymptotics (tail) of the distribution. In other words, the
(Rényi's) region of convergence for the distribution $\mathcal{P}$, defined
by

$$\mathcal{R}(\mathcal{P}) = \{\alpha : H_\alpha(\mathcal{P}) < \infty\}, \tag{5}$$

is of the form $\mathcal{R}(\mathcal{P}) = (\alpha_c(\mathcal{P}), \infty)$ or $\mathcal{R}(\mathcal{P}) = [\alpha_c(\mathcal{P}), \infty)$.
Next we give examples of distributions with both kinds of
convergence regions, for any $\alpha_c \in [0, 1]$. In the following,
notation $x_n \sim y_n$ means $\lim_{n\to\infty} x_n / y_n \in (0, \infty)$.

Example 1: Consider a distribution $\mathcal{P} = (p_1, p_2, \ldots)$ with
exponentially decreasing tail $p_n \sim 2^{-n}$. Then for any $\alpha > 0$
the sum $\sum_{n=1}^{\infty} 2^{-\alpha n}$ converges so that $\alpha_c(\mathcal{P}) = 0$ and
$\mathcal{R}(\mathcal{P}) = (0, \infty)$.

Note that any distribution with a finite number of probability
masses also has $\alpha_c(\mathcal{P}) = 0$, but the convergence region is
$\mathcal{R}(\mathcal{P}) = [0, \infty)$.

Example 2: Let $\mathcal{P} = (p_1, p_2, \ldots)$ be a distribution with
$p_n \sim n^{-\beta}$, $\beta > 1$. Then, since the series $\sum_{n=1}^{\infty} n^{-a}$ converges
if and only if $a > 1$, it follows that $\sum_{n=1}^{\infty} p_n^\alpha$ converges if and
only if $\alpha\beta > 1$. So in this case $\alpha_c(\mathcal{P}) = \beta^{-1}$ and the region
of convergence is $\mathcal{R}(\mathcal{P}) = (\beta^{-1}, \infty)$.
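
As a rough numerical illustration of the power-law example (a sketch, not part of the original argument), the following Python snippet sums consecutive blocks of the series $\sum_n n^{-\alpha\beta}$ for one order below and one order above $1/\beta$. The helper name `block_power_sum` and the values $\beta = 2$, $\alpha = 0.4$ and $\alpha = 0.75$ are assumptions chosen for illustration. Block sums that keep growing are consistent with divergence and shrinking block sums with convergence, although a finite computation cannot prove either.

```python
def block_power_sum(beta, alpha, start, stop):
    """Sum of (n**-beta)**alpha over start <= n < stop, for unnormalized masses n**-beta."""
    exponent = alpha * beta
    return sum(n ** -exponent for n in range(start, stop))

beta = 2.0   # tail p_n ~ n**-2, so the critical exponent is 1/beta = 0.5

# Two consecutive decade blocks, [10**3, 10**4) and [10**4, 10**5).
below = [block_power_sum(beta, 0.40, 10 ** k, 10 ** (k + 1)) for k in (3, 4)]
above = [block_power_sum(beta, 0.75, 10 ** k, 10 ** (k + 1)) for k in (3, 4)]
```

Normalizing the masses would only rescale every block by the same constant, so it does not affect which of the two behaviours is observed.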

Example 3: Consider a distribution $\mathcal{P} = (p_1, p_2, \ldots)$ with
$p_n \sim n^{-\beta} \log^{-2\beta} n$, $\beta > 1$. Now for any $\alpha < \beta^{-1}$,
$\sum_{n=1}^{\infty} p_n^\alpha$ diverges because $p_n^\alpha \sim n^{-\alpha\beta} \log^{-2\alpha\beta} n$ decreases
to zero strictly slower than $n^{-1}$. For $\alpha = \beta^{-1}$ we have
$p_n^\alpha \sim n^{-1} \log^{-2} n$ and the corresponding sum converges [16,
Theorem 3.29], as can be seen from the integral criterion for
the convergence of series, namely

$$\int_c^\infty \frac{1}{x \log^2 x}\, dx = \int_{\log c}^\infty \frac{1}{u^2}\, du < \infty. \tag{6}$$

So in this case $\alpha_c(\mathcal{P}) = \beta^{-1}$ and $\mathcal{R}(\mathcal{P}) = [\beta^{-1}, \infty)$.
The case $\alpha_c(\mathcal{P}) = 1$ remains.

Example 4: Consider a distribution $\mathcal{P}$ with $p_n \sim
n^{-1} \log^{-2} n$. Then $-p_n \log p_n \sim n^{-1} \log^{-1} n$ and therefore
(again by the integral criterion) $H(\mathcal{P}) = \infty$ so that $\mathcal{R}(\mathcal{P}) =
(1, \infty)$. (It can be readily checked that $H(\mathcal{P}) = \infty$ implies
$H_\alpha(\mathcal{P}) = \infty$ for $\alpha < 1$, because $-p_n \log p_n$ is bounded from
above by $p_n^\alpha$ for all $\alpha < 1$ and all sufficiently large $n$.)

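Example 5: For the last remaining case, consider $\mathcal{P}$ with
$p_n \sim n^{-1} \log^{-3} n$. Now $-p_n \log p_n \sim n^{-1} \log^{-2} n$ which
implies $H(\mathcal{P}) < \infty$, but $H_\alpha(\mathcal{P}) = \infty$ for $\alpha < 1$ since $p_n^\alpha$ is
bounded from below by $n^{-1}$ for all sufficiently large $n$. We conclude that, in this case,
$\alpha_c(\mathcal{P}) = 1$ and $\mathcal{R}(\mathcal{P}) = [1, \infty)$.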

These examples illustrate that the critical exponent of a 
distribution is determined entirely by its asymptotic behaviour. 
Here is a slightly more precise statement. 

Proposition 1: Let $\mathcal{P} = (p_1, p_2, \ldots)$ and $\mathcal{Q} = (q_1, q_2, \ldots)$
be two probability distributions. If $p_n \sim q_n$, i.e., if
$\lim_{n\to\infty} p_n / q_n \in (0, \infty)$, then $\alpha_c(\mathcal{P}) = \alpha_c(\mathcal{Q})$ and
$\mathcal{R}(\mathcal{P}) = \mathcal{R}(\mathcal{Q})$.

Proof: The proof follows immediately from a similar
statement for series. Namely, if for the sequences of
non-negative numbers $x_n$ and $y_n$, $\lim_{n\to\infty} x_n / y_n \in (0, \infty)$,
then $\sum_{n=1}^{\infty} x_n$ and $\sum_{n=1}^{\infty} y_n$ either both converge or both
diverge [17]. Now, since $\lim_{n\to\infty} p_n / q_n \in (0, \infty)$ implies
$\lim_{n\to\infty} p_n^\alpha / q_n^\alpha \in (0, \infty)$, this means that $\sum_{n=1}^{\infty} p_n^\alpha$ and
$\sum_{n=1}^{\infty} q_n^\alpha$ either both converge or both diverge and so $\mathcal{R}(\mathcal{P}) =
\mathcal{R}(\mathcal{Q})$. (The case $\alpha = 1$ should be treated separately but the
proof is very similar.) ■

The following theorem establishes continuity of $H_\alpha(\mathcal{P})$
with respect to $\alpha$ and characterizes its behaviour at $\alpha_c$.

Theorem 1: For any probability distribution $\mathcal{P}$ over a
countably infinite alphabet, $H_\alpha(\mathcal{P})$ is a continuous function in $\alpha$ in
its region of convergence. Furthermore, if $\alpha_c(\mathcal{P})$ is the critical
exponent of $\mathcal{P}$ and $H_{\alpha_c}(\mathcal{P}) = \infty$ then $\lim_{\alpha\to\alpha_c^+} H_\alpha(\mathcal{P}) =
\infty$.

Proof: The claim for $\alpha = 1$ will be proven in Section
IV. In $(\alpha_c, 1) \cup (1, \infty)$ it is enough to consider the function
$\sum_{n=1}^{\infty} p_n^\alpha$ because it is strictly positive so log will preserve
its continuity, and it is possible to divide by $\alpha - 1$ because
$\alpha \neq 1$. Since all summands are continuous functions in $\alpha$,
their sum will also be continuous if it converges uniformly [16,
Theorem 7.11], so let us check that it does. Assume first that
$\sum_{n=1}^{\infty} p_n^{\alpha_c} < \infty$. For $\alpha \ge \alpha_c$, $p_n^\alpha \le p_n^{\alpha_c}$. By Weierstrass'
criterion [16, Theorem 7.10] for the uniform convergence of
functional series, these are precisely the sufficient conditions
for the uniform convergence of $\sum_{n=1}^{\infty} p_n^\alpha$ on $[\alpha_c, \infty)$ and
therefore this is a continuous function. (Weierstrass' criterion
for the uniform convergence of series states that, if for the
series $\sum_{n=1}^{\infty} u_n(x)$ of real- or complex-valued functions
defined on some set $E$ there exists a convergent series $\sum_{n=1}^{\infty} a_n$
with $|u_n(x)| \le a_n$, $\forall n$, then the initial series converges
uniformly and absolutely on $E$.) If $\sum_{n=1}^{\infty} p_n^{\alpha_c} = \infty$ then
one can apply the same reasoning with any $\alpha_0$ instead of
$\alpha_c$, $\alpha_0 > \alpha_c$, to establish continuity in $\mathcal{R}(\mathcal{P})$. In this case
it is left to prove that $H_\alpha(\mathcal{P})$ has a vertical asymptote at the
critical exponent. Assume, for the sake of contradiction, that
$\lim_{\alpha\to\alpha_c^+} \sum_{n=1}^{\infty} p_n^\alpha = C < \infty$. $\sum_{n=1}^{\infty} p_n^\alpha$ is a monotonically
decreasing function of $\alpha$, so $\sum_{n=1}^{\infty} p_n^\alpha \le C$ for all $\alpha > \alpha_c$.
Since $\sum_{n=1}^{\infty} p_n^{\alpha_c} = \infty$ by assumption, there exists some $N$
with $\sum_{n=1}^{N} p_n^{\alpha_c} > C$. Consider $\sum_{n=1}^{N} p_n^\alpha$. This is a finite sum
so we have $\lim_{\alpha\to\alpha_c^+} \sum_{n=1}^{N} p_n^\alpha = \sum_{n=1}^{N} \lim_{\alpha\to\alpha_c^+} p_n^\alpha =
\sum_{n=1}^{N} p_n^{\alpha_c} > C$. But this means that there is some $\alpha_0 > \alpha_c$
such that $\sum_{n=1}^{N} p_n^{\alpha_0} > C$ and so $\sum_{n=1}^{\infty} p_n^{\alpha_0} > C$, which is a
contradiction. ■
Let us introduce one more concept related to the Rényi
convergence region of a distribution. Let $\Gamma$ denote the set of
all probability distributions over a countably infinite alphabet,
i.e., $\Gamma = \{(p_1, p_2, \ldots) : p_n \ge 0, \sum_{n=1}^{\infty} p_n = 1\}$ and let $\Gamma(\alpha_c)$
be the set of all distributions with critical exponent $\alpha_c$.

Remark 1: Throughout the paper, when we speak of
$\epsilon$-neighborhoods, convergence, continuity etc., we always mean
with respect to the total variation (or variational) distance

$$d_V(\mathcal{P}, \mathcal{Q}) = \|\mathcal{P} - \mathcal{Q}\|_1 = \sum_{n=1}^{\infty} |p_n - q_n| \tag{7}$$

where $\|\cdot\|_1$ is the familiar $\ell^1$ norm.

Proposition 2: $\Gamma(\alpha_c)$ is dense in $\Gamma$, for any $\alpha_c \in [0, 1]$. In
other words, $\Gamma$ is the closure of $\Gamma(\alpha_c)$, $\Gamma = \overline{\Gamma(\alpha_c)}$.

Proof: The critical exponent of a distribution is
determined by its asymptotic behaviour. One can always change the
asymptotics of a distribution and stay within distance $\epsilon$ from
the original distribution by changing its tail $(p_{n_0}, p_{n_0+1}, \ldots)$
and taking $n_0$ large enough so that this tail has sufficiently
small weight. More precisely, let $\mathcal{P} = (p_1, p_2, \ldots)$ be an
arbitrary distribution and $\epsilon > 0$ an arbitrary small number.
Assume first that $\mathcal{P}$ has infinitely many probability masses
and let $n_0$ be such that

$$\sum_{n=n_0}^{\infty} p_n < \frac{\epsilon}{2}. \tag{8}$$

Let $\mathcal{Q} \in \Gamma(\alpha_c)$ be a distribution with infinitely many
probability masses and a critical exponent $\alpha_c$. Take $(q_{n_0}, q_{n_0+1}, \ldots)$
and multiply it by a suitable constant to get $(q'_{n_0}, q'_{n_0+1}, \ldots)$
such that

$$\sum_{n=n_0}^{\infty} q'_n = \sum_{n=n_0}^{\infty} p_n. \tag{9}$$

Now let $\mathcal{S} = (s_1, s_2, \ldots)$ be a distribution defined by

$$\mathcal{S} = (p_1, \ldots, p_{n_0-1}, q'_{n_0}, q'_{n_0+1}, \ldots). \tag{10}$$

Clearly $\mathcal{S} \in \Gamma(\alpha_c)$, because $s_n \sim q_n$. Furthermore,

\begin{align}
\|\mathcal{P} - \mathcal{S}\|_1 &= \sum_{n=1}^{\infty} |p_n - s_n| \notag \\
&\le \sum_{n=n_0}^{\infty} \left( |p_n| + |q'_n| \right) \notag \\
&= \sum_{n=n_0}^{\infty} \left( p_n + q'_n \right) \tag{11} \\
&< \epsilon. \notag
\end{align}

Therefore, in the $\epsilon$-neighborhood of $\mathcal{P}$ we have found a member
of $\Gamma(\alpha_c)$. Essentially, this completes the proof of the claim,
but when $\mathcal{P}$ has finite support the proof has to be slightly
modified (in that case $\mathcal{P}$ has no tail and the construction above
fails). So let $\mathcal{P} = (p_1, \ldots, p_N)$ be a distribution with finitely many probability
masses and $\epsilon > 0$ an arbitrary small number. Let $\mathcal{Q} \in \Gamma(\alpha_c)$
be a distribution with infinitely many probability masses and
a critical exponent $\alpha_c$. Take $(q_{n_0}, q_{n_0+1}, \ldots)$, $n_0 > N$, such
that

$$\sum_{n=n_0}^{\infty} q_n < \frac{\epsilon}{2}. \tag{12}$$

Now create another distribution $\mathcal{S} = (s_1, s_2, \ldots)$ as

$$s_n = \begin{cases} p_n - \delta_n, & 1 \le n \le N \\ 0, & N < n < n_0 \\ q_n, & n \ge n_0 \end{cases} \tag{13}$$

where $\delta_n \ge 0$ are such that $p_n - \delta_n \ge 0$ and $\sum_{n=1}^{N} \delta_n =
\sum_{n=n_0}^{\infty} q_n$. Again, $\mathcal{S} \in \Gamma(\alpha_c)$ because $s_n \sim q_n$. Furthermore,

\begin{align}
\|\mathcal{P} - \mathcal{S}\|_1 &= \sum_{n=1}^{\infty} |p_n - s_n| \notag \\
&= \sum_{n=1}^{N} \delta_n + \sum_{n=n_0}^{\infty} q_n \tag{14} \\
&< \epsilon. \notag
\end{align}

Therefore, in the $\epsilon$-neighborhood of $\mathcal{P}$ we have found a member
of $\Gamma(\alpha_c)$. The proof is now complete. ■
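
The tail-replacement idea in the proof can be sketched numerically. The Python snippet below is only an illustration under stated assumptions: both distributions are truncated to `M` terms to stand in for an infinite alphabet, the original distribution is geometric, the donor tail is a power law, and the helper names `splice_tail` and `l1_distance` are hypothetical. It keeps the first `n0` masses, rescales the donor tail to carry the same mass as the discarded tail, and measures the resulting $\ell^1$ distance.

```python
import math

def splice_tail(p, q, n0):
    """Keep p[:n0]; replace the rest by q[n0:] rescaled to the mass of p[n0:]."""
    tail_p = math.fsum(p[n0:])
    tail_q = math.fsum(q[n0:])
    scale = tail_p / tail_q
    return p[:n0] + [scale * x for x in q[n0:]]

def l1_distance(p, s):
    """Variational (l1) distance between two equally long mass vectors."""
    return math.fsum(abs(a - b) for a, b in zip(p, s))

M = 1000                                            # truncation length (assumption)
p = [2.0 ** -(n + 1) for n in range(M)]             # geometric masses 1/2, 1/4, ...
z = math.fsum((n + 1) ** -2.0 for n in range(M))
q = [(n + 1) ** -2.0 / z for n in range(M)]         # power-law masses ~ n**-2
n0 = 20
s = splice_tail(p, q, n0)
distance = l1_distance(p, s)
tail_mass = math.fsum(p[n0:])
```

The spliced vector has the same total mass as the original and lies within twice the discarded tail mass of it, while its tail now decays like the power law.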
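Proposition 3: $\Gamma(\alpha_c)$ is convex in $\Gamma$, for any $\alpha_c \in [0, 1]$.

Proof: We need to show that $\mathcal{P}, \mathcal{Q} \in \Gamma(\alpha_c)$ implies $\lambda\mathcal{P} +
(1 - \lambda)\mathcal{Q} \in \Gamma(\alpha_c)$ for any $\lambda \in (0, 1)$. This is straightforward.
Asymptotic behaviour of $\lambda\mathcal{P} + (1 - \lambda)\mathcal{Q}$ is determined by $\mathcal{P}$
or $\mathcal{Q}$, whichever has heavier tail, so the critical exponent is
unchanged. More precisely, suppose

$$\lim_{n\to\infty} \frac{p_n}{q_n} = c < \infty \tag{15}$$

(if this fails then $\lim_{n\to\infty} q_n / p_n < \infty$ and the proof is carried
out by just interchanging $p_n$ and $q_n$ below). Denote

$$t_n = \lambda p_n + (1 - \lambda) q_n. \tag{16}$$

Then we have that

$$\lim_{n\to\infty} \frac{t_n}{q_n} = \lambda c + 1 - \lambda, \tag{17}$$

i.e., $t_n \sim q_n$ and so by Proposition 1, $\alpha_c(\mathcal{T}) = \alpha_c(\mathcal{Q})$ for
$\mathcal{T} = (t_1, t_2, \ldots)$, which means that $\mathcal{T} \in \Gamma(\alpha_c)$. ■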



III. Continuity properties of Rényi entropy

As for the continuity in the argument $\mathcal{P}$, it turns out that
Rényi entropy behaves differently when $\alpha > 1$ and when $\alpha \le
1$, unlike its behaviour in the case of finite alphabets.

Theorem 2: The Rényi entropy $H_\alpha(\mathcal{P})$ is a continuous
function in $\mathcal{P}$ for $\alpha > 1$ and discontinuous for $\alpha \le 1$.

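Proof: The discontinuity for $\alpha < 1$ can be established
as a corollary to Proposition 2. Take some $\alpha_c > \alpha$. In any
$\epsilon$-neighborhood of $\mathcal{P}$ there are always members of $\Gamma(\alpha_c)$
so we can find a sequence of distributions $\mathcal{P}_n \to \mathcal{P}$ with
$\mathcal{P}_n \in \Gamma(\alpha_c)$, $\forall n$. In this case, $H_\alpha(\mathcal{P}_n) = \infty$ for all $n$ which
clearly means that $H_\alpha$ is discontinuous. The discontinuity
for $\alpha = 1$, i.e., the discontinuity of Shannon entropy [14],
[15], can be proven in a similar way. One can construct a
sequence of distributions whose entropies diverge by changing
the asymptotics of the original distribution and staying within
a small distance from it. When $\alpha > 1$, however, Rényi entropy
is a continuous function. Observe that

$$H_\alpha(\mathcal{P}) = \frac{\alpha}{1-\alpha} \log \left( \sum_{n=1}^{\infty} p_n^\alpha \right)^{1/\alpha} \tag{18}$$

and it is enough to prove the continuity of

$$\|\mathcal{P}\|_\alpha = \left( \sum_{n=1}^{\infty} p_n^\alpha \right)^{1/\alpha}. \tag{19}$$

For $\alpha > 1$ this defines a norm of the vector $(p_1, p_2, \ldots)$ in the
vector space $\ell^\alpha$ (the space of sequences of real numbers with
(19) converging). It is well known that a norm is a continuous
function [19], i.e., for any sequence of distributions $\mathcal{P}_n$ with
$\|\mathcal{P} - \mathcal{P}_n\|_\alpha \to 0$ we must have $\|\mathcal{P}_n\|_\alpha \to \|\mathcal{P}\|_\alpha$, which
follows from the fact that

$$\bigl|\, \|\mathcal{P}\|_\alpha - \|\mathcal{P}_n\|_\alpha \bigr| \le \|\mathcal{P} - \mathcal{P}_n\|_\alpha. \tag{20}$$

Now continuity with respect to the total variation distance,
which we are interested in, can be established by observing
that

$$\|\mathcal{P} - \mathcal{P}_n\|_1 \ge \|\mathcal{P} - \mathcal{P}_n\|_\alpha. \tag{21}$$

This is shown by considering the $\|\cdot\|_\alpha$ norm in $\mathbb{R}^N$. One
writes some vector $(x_1, \ldots, x_N)$ as $(x_1, 0, \ldots, 0) + \cdots +
(0, \ldots, 0, x_N)$ and by the triangle inequality it follows that

$$\left( \sum_{i=1}^{N} |x_i|^\alpha \right)^{1/\alpha} \le \left( |x_1|^\alpha \right)^{1/\alpha} + \cdots + \left( |x_N|^\alpha \right)^{1/\alpha} = \sum_{i=1}^{N} |x_i|. \tag{22}$$

Taking limits when $N \to \infty$ yields (21). ■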
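
The norm comparison used at the end of the proof, namely that the $\ell^\alpha$ norm of a vector never exceeds its $\ell^1$ norm for orders of at least 1, is easy to check numerically. The following Python lines are a small sanity check on one arbitrary vector, not a proof; the helper name `lp_norm` and the test vector are assumptions made for illustration.

```python
def lp_norm(x, alpha):
    """The l^alpha norm of a finite vector x, for alpha >= 1."""
    return sum(abs(v) ** alpha for v in x) ** (1.0 / alpha)

x = [0.5, -0.25, 0.125, 0.125]                    # arbitrary test vector with l1 norm 1
norms = {alpha: lp_norm(x, alpha) for alpha in (1.0, 1.5, 2.0, 10.0)}
```

In this example the values in `norms` decrease as the order grows, starting from the $\ell^1$ norm at order 1.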
The following theorem gives more insight into the
discontinuity of $H_\alpha(\mathcal{P})$ for $\alpha \le 1$. Its special case, for $\alpha = 1$, is
proven in [14].

Theorem 3: Let $\alpha \in (0, 1]$ and let $\mathcal{P}$ be a probability
distribution over a countably infinite alphabet. Then there
exists a sequence of distributions $\mathcal{P}_n$ converging to $\mathcal{P}$ with
respect to the total variation distance, such that

$$\lim_{n\to\infty} H_\alpha(\mathcal{P}_n) = H_\alpha(\mathcal{P}) + r \tag{23}$$

for arbitrary $r \in [0, \infty]$.

Proof: The proof for $\alpha = 1$ can be found in [14],
so assume that $\alpha \in (0, 1)$. The case $r = \infty$ is taken
care of by taking $\mathcal{P}_n \in \Gamma(\alpha_c)$ for some $\alpha_c > \alpha$, as in
the proof of Theorem 2. In that case $H_\alpha(\mathcal{P}_n) = \infty$, $\forall n$,
and so $\lim_{n\to\infty} H_\alpha(\mathcal{P}_n) = \infty$. The case $r = 0$ is trivial,
take for example $\mathcal{P}_n = \mathcal{P}$ (but nontrivial sequences with
$\lim_{n\to\infty} H_\alpha(\mathcal{P}_n) = H_\alpha(\mathcal{P})$ can also be constructed). So let
$r \in (0, \infty)$. We will construct a sequence of distributions
$\mathcal{P}_n = (p_1(n), p_2(n), \ldots)$ converging to $\mathcal{P}$ and such that

$$H_\alpha(\mathcal{P}_n) = H_\alpha(\mathcal{P}) + r \tag{24}$$

for all $n$. If $b$ is the base of the logarithm in the definition of $H_\alpha$, this is
equivalent to

$$\sum_{i=1}^{\infty} p_i(n)^\alpha = b^{(1-\alpha) r} \sum_{i=1}^{\infty} p_i^\alpha. \tag{25}$$

Since $\alpha \in (0, 1)$ and $r \in (0, \infty)$, we have $b^{(1-\alpha) r} \in (1, \infty)$.
It follows that the righthand side of (25), call it $h$, satisfies

$$h > \sum_{i=1}^{\infty} p_i^\alpha. \tag{26}$$

Therefore, we want to construct a sequence $\mathcal{P}_n$ with
$\sum_{i=1}^{\infty} p_i(n)^\alpha = h$, for arbitrary given $h$ satisfying (26). The
construction is as follows

\begin{align}
\mathcal{P}_n &= (p_1(n), p_2(n), \ldots) \notag \\
&= (p_1, \ldots, p_n, B_{(n)}, B_{(n)} q_{(n)}, B_{(n)} q_{(n)}^2, \ldots). \tag{27}
\end{align}

In other words, we keep the first $n$ probability masses of $\mathcal{P}$ and
replace the tail of $\mathcal{P}$ with the tail of a geometric distribution.
According to (25) and (27), $B_{(n)}$ and $q_{(n)}$ should satisfy the
following:

$$\sum_{i=0}^{\infty} B_{(n)} q_{(n)}^i = B_{(n)} \frac{1}{1 - q_{(n)}} = \sum_{i=n+1}^{\infty} p_i \tag{28}$$

and

$$\sum_{i=1}^{n} p_i^\alpha + \sum_{i=0}^{\infty} B_{(n)}^\alpha q_{(n)}^{\alpha i} = \sum_{i=1}^{n} p_i^\alpha + B_{(n)}^\alpha \frac{1}{1 - q_{(n)}^\alpha} = h. \tag{29}$$

We need to verify that such $B_{(n)}$ and $q_{(n)}$ exist, i.e., that
the above two equations have non-negative solutions. Express
$B_{(n)}$ from (28):

$$B_{(n)} = (1 - q_{(n)}) \sum_{i=n+1}^{\infty} p_i \tag{30}$$

and insert it into (29) to get

$$\frac{(1 - q_{(n)})^\alpha}{1 - q_{(n)}^\alpha} = \frac{h - \sum_{i=1}^{n} p_i^\alpha}{\left( \sum_{i=n+1}^{\infty} p_i \right)^\alpha}. \tag{31}$$

Now we need to check that the above equation has a solution
for $q_{(n)} \in (0, 1)$ and for all $n \ge n_0$ for some $n_0$. To
show this observe that the lefthand side is a continuous and
monotonically increasing function in $q_{(n)}$, starting from 1 and
going to $\infty$ when $q_{(n)} \in (0, 1)$. This means that (31) will have
a solution whenever the righthand side is greater than 1. This is
indeed the case for all $n$ large enough. Namely, the numerator
on the righthand side of (31) tends to $h - \sum_{i=1}^{\infty} p_i^\alpha$ as $n \to \infty$,
which is by (26) strictly positive, and the denominator tends
to zero so the entire righthand side tends to $\infty$ and is therefore
greater than 1 for $n \ge n_0$ for some $n_0$. This means that, for all
$n$ (large enough), there exist $B_{(n)} > 0$ and $q_{(n)} \in (0, 1)$ such
that (28) and (29) hold. Thus we have found a sequence $(\mathcal{P}_n)$
with $H_\alpha(\mathcal{P}_n) = H_\alpha(\mathcal{P}) + r$ for arbitrary $r \in (0, \infty)$, and,
furthermore, from (27) and (28) it is easy to see that $\mathcal{P}_n \to \mathcal{P}$
when $n \to \infty$ with respect to the variational distance. We
should mention that this proof assumes that $\mathcal{P}$ has infinitely
many probability masses and it needs to be modified when
this is not true. This is not hard to do but we omit it here (see
the proof of Proposition 2 for a similar construction). ■
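
The construction in the proof can be tried out on a concrete case. The sketch below is an illustration under stated assumptions rather than the authors' procedure: it takes the geometric distribution with masses $2^{-i}$, order 0.5, base 2, $r = 1$ and $n = 5$ (all chosen here for convenience), uses closed-form geometric sums for the head and tail, and finds the ratio of the replacement tail by bisection, relying on the monotonicity of the left-hand side stated in the proof. The function name `solve_geometric_tail` is hypothetical.

```python
import math

def solve_geometric_tail(head_power_sum, tail_mass, h, alpha, iters=200):
    """Find (B, q) with B/(1-q) = tail_mass and head_power_sum + B**alpha/(1-q**alpha) = h."""
    rhs = (h - head_power_sum) / tail_mass ** alpha
    if rhs <= 1.0:
        raise ValueError("n is too small: the right-hand side must exceed 1")
    f = lambda q: (1.0 - q) ** alpha / (1.0 - q ** alpha)   # increasing on (0, 1)
    lo, hi = 0.0, 1.0 - 1e-12
    if f(hi) < rhs:
        raise ValueError("solution too close to 1 for this simple bracket")
    for _ in range(iters):                                   # plain bisection
        mid = 0.5 * (lo + hi)
        if f(mid) < rhs:
            lo = mid
        else:
            hi = mid
    q = 0.5 * (lo + hi)
    return (1.0 - q) * tail_mass, q

alpha, r, b, n = 0.5, 1.0, 2.0, 5            # illustrative choices
x = 2.0 ** -alpha                            # p_i**alpha = x**i for p_i = 2**-i
full = x / (1.0 - x)                         # sum of p_i**alpha over all i >= 1
head = x * (1.0 - x ** n) / (1.0 - x)        # sum of p_i**alpha over i = 1..n
tail = 2.0 ** -n                             # sum of p_i over i > n
h = b ** ((1.0 - alpha) * r) * full          # target power sum

B, q = solve_geometric_tail(head, tail, h, alpha)
new_power_sum = head + B ** alpha / (1.0 - q ** alpha)
gap = math.log(new_power_sum / full, b) / (1.0 - alpha)   # entropy increase
```

Here `gap` recovers the prescribed increase `r`, while the modified distribution differs from the original only beyond the first `n` masses, so its variational distance from the original is at most twice `tail`.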
Constant $r$ in the previous theorem was taken to be
non-negative. This is necessary, as the following theorem shows.

Theorem 4: Let $\mathcal{P}_n, \mathcal{P}$ be probability distributions over a
countably infinite alphabet. If $\mathcal{P}_n \to \mathcal{P}$ with respect to the
variational distance, then $\liminf_{n\to\infty} H_\alpha(\mathcal{P}_n) \ge H_\alpha(\mathcal{P})$.

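Proof: For $\alpha > 1$, $H_\alpha$ is continuous and the claim is
obviously true. Suppose $\alpha < 1$. Let $\mathcal{P}_n = (p_1(n), p_2(n), \ldots)$
and $\mathcal{P} = (p_1, p_2, \ldots)$, and let $\mathcal{P}_n^{(K)} = (p_1(n), \ldots, p_K(n))$,
$\mathcal{P}^{(K)} = (p_1, \ldots, p_K)$. $\mathcal{P}_n^{(K)}$ and $\mathcal{P}^{(K)}$ are obviously not
probability distributions but that does not affect the proof. For
example, $H_\alpha(\mathcal{P}_n^{(K)})$ are well-defined. Now, if $\mathcal{P}_n \to \mathcal{P}$ then
also $\mathcal{P}_n^{(K)} \to \mathcal{P}^{(K)}$ when $n \to \infty$. It follows that

$$\lim_{n\to\infty} H_\alpha(\mathcal{P}_n^{(K)}) = H_\alpha(\mathcal{P}^{(K)}) \tag{32}$$

because Rényi entropies are continuous when the alphabet is
finite. Now, since

$$\sum_{i=1}^{\infty} p_i(n)^\alpha \ge \sum_{i=1}^{K} p_i(n)^\alpha \tag{33}$$

or (for $\alpha < 1$)

$$H_\alpha(\mathcal{P}_n) \ge H_\alpha(\mathcal{P}_n^{(K)}), \tag{34}$$

it follows from (32) that

$$\liminf_{n\to\infty} H_\alpha(\mathcal{P}_n) \ge H_\alpha(\mathcal{P}^{(K)}). \tag{35}$$

This is true for all $K$ and so

$$\liminf_{n\to\infty} H_\alpha(\mathcal{P}_n) \ge \lim_{K\to\infty} H_\alpha(\mathcal{P}^{(K)}) = H_\alpha(\mathcal{P}). \tag{36}$$

The case $\alpha = 1$ is completely analogous. ■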
The property stated in Theorem 4 is usually referred to as
lower-semicontinuity. It is a well known property of Shannon
entropy [18], [15] and is now generalized to all Rényi
entropies. Also, the above proof is much simpler, in our opinion,
than those reported before for Shannon entropy.

We mention in this context one more property of $H_\alpha$.

Theorem 5: $H_\alpha(\mathcal{P})$ is a $\cap$-convex function in $\mathcal{P}$ for $\alpha < 1$
and is neither $\cap$- nor $\cup$-convex for $\alpha > 1$.

This is proven in [3] and those arguments easily transfer to
the infinite case.

IV. The limiting case $\alpha \to 1$

Now let us consider what happens at the point $\alpha = 1$. For a
fixed finite alphabet, Rényi entropy is defined at this point (2)
so as to preserve continuity (in $\alpha$) [1]. There are several issues
in the case of an infinite alphabet which make continuity more
difficult to prove than in the finite case. First, $H(\mathcal{P})$ might
be infinite (see Example 4), and in that case it needs to be
checked how $H_\alpha(\mathcal{P})$ behaves as $\alpha \to 1^+$. Next, it is possible
that $H(\mathcal{P}) < \infty$ but $H_\alpha(\mathcal{P}) = \infty$ for all $\alpha < 1$ (see Example
5), in which case clearly $\alpha \to 1$ needs to be separated into two
cases $\alpha \to 1^-$ and $\alpha \to 1^+$. And finally, even without these
two situations, one needs to be careful when interchanging
limiting operations because infinite sums are involved.

Theorem 6: If $\alpha_c(\mathcal{P}) < 1$ then $\lim_{\alpha\to 1} H_\alpha(\mathcal{P}) = H(\mathcal{P})$. If
$\alpha_c(\mathcal{P}) = 1$ then $\lim_{\alpha\to 1^+} H_\alpha(\mathcal{P}) = H(\mathcal{P})$.

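Proof: Assume first that $H(\mathcal{P}) < \infty$. Then we have

\begin{align}
\lim_{\alpha\to 1^+} H_\alpha(\mathcal{P}) &= \lim_{\alpha\to 1^+} \frac{\log \sum_{n=1}^{\infty} p_n^\alpha}{1 - \alpha} \tag{37} \\
&= -\lim_{\alpha\to 1^+} \frac{\sum_{n=1}^{\infty} p_n^\alpha \log p_n}{\sum_{n=1}^{\infty} p_n^\alpha} \tag{38} \\
&= -\frac{\lim_{\alpha\to 1^+} \sum_{n=1}^{\infty} p_n^\alpha \log p_n}{\lim_{\alpha\to 1^+} \sum_{n=1}^{\infty} p_n^\alpha} \tag{39} \\
&= -\frac{\sum_{n=1}^{\infty} \lim_{\alpha\to 1^+} p_n^\alpha \log p_n}{\sum_{n=1}^{\infty} \lim_{\alpha\to 1^+} p_n^\alpha} \tag{40} \\
&= -\frac{\sum_{n=1}^{\infty} p_n \log p_n}{\sum_{n=1}^{\infty} p_n} \tag{41} \\
&= H(\mathcal{P}). \tag{42}
\end{align}

Let us justify the above steps. (37) is by definition. (38)
follows from L'Hôpital's rule. A sufficient condition for its
application [16, Theorem 5.13] is the existence of the limit of
the ratio of the derivatives, which will follow from subsequent
equations and our assumption $H(\mathcal{P}) < \infty$. The equality (39)
is justified by the fact that the limit of the denominator is
not zero. (40) follows from uniform convergence of the series
$\sum_{n=1}^{\infty} p_n^\alpha \log p_n$ and $\sum_{n=1}^{\infty} p_n^\alpha$ on $[1, \infty)$. This is established
easily by Weierstrass' criterion [16] using the following facts
(valid for $\alpha \ge 1$)

$$-p_n^\alpha \log p_n \le -p_n \log p_n, \qquad p_n^\alpha \le p_n, \tag{43}$$

$$-\sum_{n=1}^{\infty} p_n \log p_n = H(\mathcal{P}) < \infty, \qquad \sum_{n=1}^{\infty} p_n = 1. \tag{44}$$

Steps (41) and (42) are obvious. If $\alpha_c(\mathcal{P}) = 1$ then clearly the
above limit is the only one that makes sense. If $\alpha_c(\mathcal{P}) < 1$ then
one can take any $\alpha_0 \in (\alpha_c, 1)$ and repeat the above arguments
about uniform convergence on $[\alpha_0, \infty)$ and then the claim is
true when $\alpha \to 1$ (all the other steps are identical). It remains
to be shown that $\lim_{\alpha\to 1^+} H_\alpha(\mathcal{P}) = \infty$ when $H(\mathcal{P}) = \infty$.
To prove this we define a sequence of distributions

$$\mathcal{Q}_n = (q_1(n), \ldots, q_n(n)) = \Bigl( p_1, \ldots, p_{n-1}, \sum_{i=n}^{\infty} p_i \Bigr). \tag{45}$$

We have $\lim_{n\to\infty} \mathcal{Q}_n = \mathcal{P}$ in the sense that the variational
distance between $\mathcal{Q}_n$ and $\mathcal{P}$ tends to zero. Also

$$\lim_{n\to\infty} H(\mathcal{Q}_n) = \infty = H(\mathcal{P}). \tag{46}$$

This follows from the fact that Shannon entropy is
lower-semicontinuous (Theorem 4), namely $\liminf_{n\to\infty} H(\mathcal{Q}_n) \ge H(\mathcal{P})$ (in
general, however, $\mathcal{Q}_n \to \mathcal{P}$ does not imply $H(\mathcal{Q}_n) \to H(\mathcal{P})$
[14]). Now observe that, for $\alpha > 1$,

$$\Bigl( \sum_{i=n}^{\infty} p_i \Bigr)^\alpha \ge \sum_{i=n}^{\infty} p_i^\alpha \tag{47}$$

(see (22)). Now (45) and (47) give

$$\sum_{i=1}^{n} q_i(n)^\alpha \ge \sum_{i=1}^{\infty} p_i^\alpha \tag{48}$$

which gives

$$\log \sum_{i=1}^{n} q_i(n)^\alpha \ge \log \sum_{i=1}^{\infty} p_i^\alpha \tag{49}$$

and finally

$$H_\alpha(\mathcal{Q}_n) \le H_\alpha(\mathcal{P}). \tag{50}$$

This is true for any $\alpha > 1$ and all $n \ge 1$. Taking limits on
both sides we get

$$\lim_{\alpha\to 1^+} H_\alpha(\mathcal{Q}_n) \le \lim_{\alpha\to 1^+} H_\alpha(\mathcal{P}) \tag{51}$$

which holds for all $n$. Now, since $H(\mathcal{Q}_n) < \infty$, by the first
part of our proof the lefthand side is equal to $H(\mathcal{Q}_n)$. And
since $H(\mathcal{Q}_n)$ is unbounded (46), the righthand side must be
unbounded too, i.e., $\lim_{\alpha\to 1^+} H_\alpha(\mathcal{P}) = \infty$. This completes
the proof of the theorem. ■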
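Here are also two, potentially useful, restatements of the
above theorem (just omit the logarithms). For any sequence
$(p_1, p_2, \ldots)$, $p_n \ge 0$, $\sum_{n=1}^{\infty} p_n = 1$,

$$\lim_{\alpha\to 1^+} \Bigl( \sum_{n=1}^{\infty} p_n^\alpha \Bigr)^{\frac{1}{1-\alpha}} = \prod_{n=1}^{\infty} p_n^{-p_n} \tag{52}$$

or

$$\lim_{\alpha\to 1^+} \|\mathcal{P}\|_\alpha^{\frac{\alpha}{1-\alpha}} = \prod_{n=1}^{\infty} p_n^{-p_n} \tag{53}$$

where $\|\cdot\|_\alpha$ denotes the $\ell^\alpha$ norm, as usual.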

Let us exemplify one consequence of these results. Let $\mathcal{P}$
be some distribution over a countably infinite alphabet such
that $H(\mathcal{P}) < \infty$. In [14], it is shown that there always
exists a sequence of distributions $\mathcal{P}_n$ such that $\mathcal{P}_n \to \mathcal{P}$,
but $H(\mathcal{P}_n) \not\to H(\mathcal{P})$. Actually, it is shown [14, Theorem
2] that for any $c > 0$, there is such a sequence $\mathcal{P}_n$ so
that $\lim_{n\to\infty} H(\mathcal{P}_n) = H(\mathcal{P}) + c$ (this is a special case of
Theorem 3 above). Using this and Theorem 6 one concludes
that $\lim_{n\to\infty} \lim_{\alpha\to 1^+} H_\alpha(\mathcal{P}_n) = \lim_{n\to\infty} H(\mathcal{P}_n)$ need not
equal $H(\mathcal{P})$. On the other hand, Theorems 2 and 6 guarantee
that $\lim_{\alpha\to 1^+} \lim_{n\to\infty} H_\alpha(\mathcal{P}_n) = \lim_{\alpha\to 1^+} H_\alpha(\mathcal{P}) = H(\mathcal{P})$
for any sequence $\mathcal{P}_n \to \mathcal{P}$. We summarize this in the form of
a theorem whose proof we have essentially described.

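Theorem 7: Let $\mathcal{P} = (p_1, p_2, \ldots)$ be a probability
distribution. Then, for any $r \in [0, \infty]$, there exists a sequence of
distributions $\mathcal{P}_n$ converging to $\mathcal{P}$ with respect to variational
distance, such that

$$\lim_{n\to\infty} \lim_{\alpha\to 1^+} H_\alpha(\mathcal{P}_n) = H(\mathcal{P}) + r, \tag{54}$$

but for any such sequence

$$\lim_{\alpha\to 1^+} \lim_{n\to\infty} H_\alpha(\mathcal{P}_n) = H(\mathcal{P}). \tag{55}$$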



In applied sciences one usually freely interchanges limiting 
operations, such as limits, sums, integrals, derivatives etc. But 
one must be careful when doing this, as such rules do not 
always apply. The above is an illustrative example of this, 
involving quantities with physical meaning. 
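
The failure of the interchange can be seen on a simple family of distributions. The example below is a hypothetical illustration chosen here, not the construction used in the cited proofs: the $n$-th distribution puts mass $1 - 1/n$ on one symbol and spreads the remaining $1/n$ uniformly over $2^{cn}$ further symbols, so it converges in variational distance to a point mass, whose entropy is 0. The power sum has a closed form, so no long vectors are needed; the function name `renyi_bits` and the parameter `c` are assumptions.

```python
import math

def renyi_bits(n, alpha, c=1.0):
    """H_alpha in bits of (1 - 1/n, then 2**(c*n) masses equal to (1/n) / 2**(c*n)); needs n >= 2."""
    eps = 1.0 / n
    log2_m = c * n                                   # log2 of the number of small masses
    if alpha == 1.0:
        # Shannon entropy: binary entropy of eps plus eps * log2(M).
        return -(1 - eps) * math.log2(1 - eps) - eps * math.log2(eps) + eps * log2_m
    # Power sum: (1 - eps)**alpha + M * (eps / M)**alpha = (1 - eps)**alpha + eps**alpha * M**(1 - alpha).
    power_sum = (1 - eps) ** alpha + eps ** alpha * 2.0 ** ((1 - alpha) * log2_m)
    return math.log2(power_sum) / (1 - alpha)

shannon_large_n = renyi_bits(1000, 1.0)     # stays near c = 1 bit as n grows
order_above_one = renyi_bits(1000, 1.5)     # close to 0 for large n
```

For fixed $n$ the limit over orders decreasing to 1 gives the Shannon value, which approaches $c$ as $n$ grows, whereas for any fixed order above 1 the limit over $n$ is 0, so the two iterated limits differ by $c$.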



V. The limiting case $\alpha \to \infty$

There is one more interesting limiting case for Rényi
entropies, namely $\alpha \to \infty$. It is known [10], [20] that

$$\lim_{\alpha\to\infty} H_\alpha(\mathcal{Q}) = -\log \max_n q_n \tag{56}$$

when $\mathcal{Q}$ has finite support. It is easy to prove that this remains
true for any $(q_1, \ldots, q_n)$, $q_i \ge 0$, with $\sum_i q_i$ not necessarily
equal to 1. The same is true in the infinite case, the proof is
just a little more subtle.

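Let $\mathcal{P} = (p_1, p_2, \ldots)$ be a probability distribution. First
observe that

\begin{align}
\lim_{\alpha\to\infty} H_\alpha(\mathcal{P}) &= \lim_{\alpha\to\infty} \frac{1}{1 - \alpha} \log \sum_{i=1}^{\infty} p_i^\alpha \tag{57} \\
&= \lim_{\alpha\to\infty} \frac{-\sum_{i=1}^{\infty} p_i^\alpha \log p_i}{\sum_{i=1}^{\infty} p_i^\alpha} \tag{58} \\
&\ge \lim_{\alpha\to\infty} \frac{-\log \max_i p_i \, \sum_{i=1}^{\infty} p_i^\alpha}{\sum_{i=1}^{\infty} p_i^\alpha} \tag{59} \\
&= -\log \max_i p_i \tag{60}
\end{align}

where (58) is by L'Hôpital's rule and (59) by lower bounding
$-\log p_i$. Now to prove that this is also an upper bound, write

$$\sum_{i=1}^{\infty} p_i^\alpha \ge \sum_{i=1}^{n} p_i^\alpha \tag{61}$$

which is true for all $n$ and all $\alpha > 1$. Let $\mathcal{P}_n = (p_1, \ldots, p_n)$.
($\mathcal{P}_n$ is not a probability distribution but that does not affect
the proof.) This gives

$$H_\alpha(\mathcal{P}) \le H_\alpha(\mathcal{P}_n), \tag{62}$$

so that

$$\lim_{\alpha\to\infty} H_\alpha(\mathcal{P}) \le \lim_{\alpha\to\infty} H_\alpha(\mathcal{P}_n) = -\log \max_{i \in \{1, \ldots, n\}} p_i. \tag{63}$$

Since (63) holds for all $n$, it follows that

$$\lim_{\alpha\to\infty} H_\alpha(\mathcal{P}) \le -\log \max_i p_i. \tag{64}$$

Together with (60) this yields

$$\lim_{\alpha\to\infty} H_\alpha(\mathcal{P}) = -\log \max_i p_i. \tag{65}$$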
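
The limit can be observed numerically on a small finite distribution. The sketch below is illustrative only: the function name `renyi_entropy_stable` and the sample distribution are assumptions, and the largest mass is factored out of the power sum so that very large orders do not drive the whole sum to zero in floating point.

```python
import math

def renyi_entropy_stable(p, alpha, base=2.0):
    """H_alpha for alpha != 1, computed with the largest mass factored out of the power sum."""
    masses = [x for x in p if x > 0]
    p_max = max(masses)
    # sum(p_i**alpha) = p_max**alpha * sum((p_i / p_max)**alpha); the scaled sum lies in [1, len(masses)].
    scaled = sum((x / p_max) ** alpha for x in masses)
    return (alpha * math.log(p_max, base) + math.log(scaled, base)) / (1.0 - alpha)

p = [0.5, 0.3, 0.2]                               # illustrative distribution
min_entropy = -math.log(max(p), 2.0)              # -log2 of the largest mass
values = [renyi_entropy_stable(p, a) for a in (2.0, 10.0, 100.0, 10000.0)]
```

The entries of `values` decrease towards `min_entropy` as the order grows, in line with the limit derived above.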



